Homework-Project-2

Author

Cole Marshall

library(tokenizers)
library(tidytext)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(ggplot2)

Background

About the dataset: The Spotify Million Song Dataset table has 57,650 rows and 4 columns, named ‘artist’, ‘song’, ‘link’, and ‘text’, all of string type. Each row describes one song, and the ‘text’ column holds the lyrics that are analyzed in this report.

Questions

This report explores the following four questions. The general approach taken to each one is outlined beneath it.

  1. What are the most common themes or topics discussed in song lyrics?

    • Analyze the ‘text’ column to identify recurring words, phrases, or themes in song lyrics. Use techniques like word frequency analysis, topic modeling, or sentiment analysis to uncover prevalent themes or emotions.
  2. Which artists have the largest vocabulary in their song lyrics?

    • Calculate the diversity of vocabulary for each artist by analyzing the unique words used in their song lyrics. Identify artists who use a wide range of vocabulary versus those who stick to a more limited set of words.
  3. Are there any trends or patterns in the sentiment of song lyrics over time or across genres?

    • Perform sentiment analysis on the lyrics to determine the overall sentiment (positive, negative, neutral) of songs. Explore if there are shifts in sentiment over different time periods or if certain genres tend to exhibit particular emotional tones.
  4. Which artists tend to use the most profanity?

    • To gauge profanity in song lyrics, we first compile a list of profane words. Lyrics are broken into words or phrases, and each one is compared against the list to identify profane instances. These instances are tallied, and the counts are normalized (for example, by total word count) so that artists with more lyrics are assessed fairly. Trends across artists, genres, or time periods are then analyzed, keeping in mind that profanity is subjective and depends on cultural context. This approach reveals differences in profanity usage among artists.
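The approach outlined for question 4 can be sketched on toy data. This is a minimal sketch, not the final analysis: `profanity_words` is a hypothetical placeholder list (a real analysis would use a curated lexicon), `toy_songs` is an invented stand-in for df_songs, and normalizing per 1,000 words is one assumed choice among several.

```r
library(dplyr)
library(tidytext)

# Hypothetical placeholder list; a real analysis would use a curated lexicon
profanity_words <- c("darn", "heck")

# Invented stand-in for df_songs (artist and text columns only)
toy_songs <- data.frame(
  artist = c("Artist A", "Artist A", "Artist B"),
  text   = c("darn it all", "oh heck oh darn", "la la la la"),
  stringsAsFactors = FALSE
)

profanity_rate <- toy_songs %>%
  unnest_tokens(word, text) %>%                  # one row per word, lowercased
  mutate(is_profane = word %in% profanity_words) %>%
  group_by(artist) %>%
  summarise(
    total_words        = n(),
    profane_words      = sum(is_profane),
    profanity_per_1000 = 1000 * profane_words / total_words
  ) %>%
  arrange(desc(profanity_per_1000))

print(profanity_rate)
```

Dividing by the total word count keeps artists with many songs from ranking highly simply because they have more lyrics in the dataset.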

Importing Data

# Read from csv into object
df_songs <- read.csv("Spotify Million Song Dataset_exported.csv")

Cleaning Data

We remove the link associated with each song, as it is not relevant to the questions this report tries to answer. We also create a new identifier for each song that corresponds to its row number in df_songs.

# Remove link variable
df_songs$link <- NULL

# Add a song number (identifier) of the form "song_<row number>" so that songs can be tracked between objects
df_songs <- df_songs %>%
  mutate(song_num = paste0("song_", row_number()))

General Data Wrangling

The code below is the text-processing pipeline used to prepare the lyrics for analysis. It initializes a list named list_songs and iterates over each row of df_songs. In each iteration, the lyrics are read from the third column (‘text’, now that ‘link’ has been removed), converted to character type, and tokenized into words with the unnest_tokens function. Stopwords, common non-informative words like “the” and “and”, are then removed with an anti-join against the stop_words list. The resulting data frame of words is stored in list_songs under a name of the form song_i, which matches the song_num identifier. Finally, the entire list is saved as an RDS file named “list_songs”. The loop is commented out here, and the saved file is read back in instead.

# Initialize a list to store unnested word objects
#list_songs <- list()

# Iterate over each row of the dataframe
#for (i in 1:nrow(df_songs)) {
  # Read the text from the 3rd column
  #songs_text <- df_songs[i, 3]
  
  # Convert to character
  #songs_text <- as.character(songs_text)
  
  # Create a dataframe with the text
  #df_song <- data.frame(text = songs_text)
  
  # Unnest tokens
  #post_unnested <- unnest_tokens(df_song, word, text)
  
  # Antijoin with Stopwords for all objects in list
  #post_unnested_filtered <- anti_join(post_unnested, stop_words, by = "word")
  
  # Store the unnested word object in the list
  #list_songs[[paste0("song_", i)]] <- post_unnested_filtered
#}

#saveRDS(list_songs, "list_songs")

list_songs <- readRDS("list_songs")
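The same preprocessing can also be written without an explicit loop, since unnest_tokens works on a whole data frame at once. The following is a minimal sketch on invented data (`toy_songs`, `toy_words`, and `toy_list` are hypothetical names), not the code that produced the saved list_songs file.

```r
library(dplyr)
library(tidytext)

# Invented stand-in for df_songs after cleaning
toy_songs <- data.frame(
  artist   = c("Artist A", "Artist B"),
  song     = c("First", "Second"),
  text     = c("The sun is shining on the river",
               "And the river runs to the sea"),
  song_num = c("song_1", "song_2"),
  stringsAsFactors = FALSE
)

# Tokenize every song in one call, keeping song_num next to each word,
# then drop stopwords with the same anti-join used in the loop
toy_words <- toy_songs %>%
  select(song_num, text) %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# If a per-song list is still needed, split the long table by song_num
toy_list <- split(toy_words["word"], toy_words$song_num)
```

In this form the long table `toy_words` already plays the role of the combined data frame, and each word stays linked to its song through song_num. Note that split orders the list elements alphabetically by name rather than by row number.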

Question 1: What are the most common themes or topics discussed in song lyrics?

For this, we can perform a word frequency analysis to identify recurring words or phrases in the song lyrics.

# Combine all words from all songs into a single dataframe
all_words <- bind_rows(list_songs)

# Perform word frequency analysis
word_freq <- all_words %>%
  count(word, sort = TRUE)

# Select the top 20 most common words
top_words <- word_freq %>%
  slice_max(n = 20, order_by = n)

# Plotting the top words
ggplot(top_words, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "skyblue") +
  labs(title = "Top 20 Most Common Words in Song Lyrics",
       x = "Word",
       y = "Frequency") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
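Single words only hint at themes, so recurring phrases may also be worth counting. The sketch below counts two-word phrases (bigrams) on invented lyrics; `toy_songs` and `bigram_freq` are hypothetical names, and no stopword filtering is applied here.

```r
library(dplyr)
library(tidytext)

# Invented lyrics standing in for the 'text' column
toy_songs <- data.frame(
  song_num = c("song_1", "song_2"),
  text     = c("I love you baby I love you", "baby I love you so"),
  stringsAsFactors = FALSE
)

# token = "ngrams" with n = 2 yields one row per pair of adjacent words
bigram_freq <- toy_songs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%   # texts shorter than two words produce NA
  count(bigram, sort = TRUE)

print(bigram_freq)
```

Because the bigrams are built within each song's text, no phrase spans the boundary between two songs.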

Question 2: Which artists have the largest vocabulary in their song lyrics?

We can calculate the diversity of vocabulary for each artist by analyzing the unique words used in their song lyrics.
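A raw count of unique words grows with the amount of text available for an artist, so a normalized measure can be a useful companion to it. As a minimal sketch on invented lyrics (`toy_lyrics` and `diversity_df` are hypothetical names), the type-token ratio divides the number of unique words by the total number of words. This is an illustrative alternative, not the measure reported in this section.

```r
library(tokenizers)

# Invented combined lyrics, one string per artist
toy_lyrics <- c(
  "Artist A" = "la la la la love love",
  "Artist B" = "every word here is different"
)

# tokenize_words returns a list with one character vector per input string
tokens <- tokenize_words(toy_lyrics)
names(tokens) <- names(toy_lyrics)

diversity_df <- data.frame(
  artist          = names(tokens),
  total_words     = lengths(tokens),
  vocabulary_size = vapply(tokens, function(w) length(unique(w)), integer(1)),
  row.names       = NULL
)

# Type-token ratio: unique words divided by total words
diversity_df$type_token_ratio <-
  diversity_df$vocabulary_size / diversity_df$total_words

print(diversity_df)
```

The ratio itself tends to fall as texts get longer, so it is best compared across artists with similar amounts of lyrics.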

#artist_dfs <- list()

# Get unique artists from df_songs
#unique_artists <- unique(df_songs$artist)

# Loop through each artist
#for (artist in unique_artists) {
  # Filter rows where artist is the current artist
  #artist_songs <- df_songs[df_songs$artist == artist, ]
  
  # Combine text for the current artist
  #artist_text <- paste(artist_songs$text, collapse = ' ')
  
  # Create DataFrame for the current artist
  #artist_df <- data.frame(artist = artist, text_combined = artist_text)
  
  # Append the dataframe to the list
  #artist_dfs[[artist]] <- artist_df
#}

# Combine all artist dataframes into a single dataframe
#all_artists_df <- do.call(rbind, artist_dfs)

# Print the combined dataframe
#print(all_artists_df)

#saveRDS(all_artists_df, file = "all_artists_df")
  
all_artists_df <- readRDS("all_artists_df")

# Initialize an empty list to store tokenized words for each artist
#tokenized_words_list <- list()

# Loop through each artist in the dataframe
#for (artist in unique(all_artists_df$artist)) {
  # Filter dataframe for the current artist
  #artist_df <- all_artists_df[all_artists_df$artist == artist, ]
  
  # Tokenize the combined lyrics for the current artist
  #tokenized_words <- unlist(tokenize_words(artist_df$text_combined))
  
  # Add tokenized words to the list
  #tokenized_words_list[[artist]] <- tokenized_words
#}

#saveRDS(tokenized_words_list, file = "tokenized_words_list")

tokenized_words_list <- readRDS("tokenized_words_list")

# Calculate vocabulary size (number of unique words) for each artist.
# vapply builds the whole column at once instead of growing the dataframe with rbind in a loop.
vocabulary_df <- data.frame(
  artist = names(tokenized_words_list),
  vocabulary_size = vapply(tokenized_words_list,
                           function(words) length(unique(words)),
                           integer(1)),
  row.names = NULL
)

# Sort the dataframe by vocabulary size
vocabulary_df <- vocabulary_df[order(vocabulary_df$vocabulary_size, decreasing = TRUE), ]

# Print the dataframe to see which artists have the biggest and smallest vocabularies
print(vocabulary_df)
                                          artist vocabulary_size
303                                    LL Cool J            6677
224                           Insane Clown Posse            6362
297                                    Lil Wayne            6316
139                                     Fabolous            6081
582                                 Wu-Tang Clan            5603
120                                       Eminem            5262
252                                Joni Mitchell            5230
262                                   Kanye West            5043
210                                     Ice Cube            5015
41                                     Bob Dylan            4921
243                                Jimmy Buffett            4821
565                            Weird Al Yankovic            4721
186                                   Gucci Mane            4702
108                                        Drake            4646
317                                    Marillion            4616
354                                    Nick Cave            4554
387                                      Outkast            4514
448                        Red Hot Chili Peppers            4387
221                                 Indigo Girls            4265
406                                        Phish            4241
357                                  Nicki Minaj            4235
473                                   Snoop Dogg            4234
462                                         Rush            4231
51                             Bruce Springsteen            4188
596                                       Xzibit            4161
31                               Beautiful South            4137
307                                     Lou Reed            4105
273                                     Kid Rock            4040
24                                Arrogant Worms            3982
419                                   Puff Daddy            3977
118                               Elvis Costello            3962
364                                         NOFX            3952
456                              Robbie Williams            3940
396                                   Paul Simon            3922
249                                   John Prine            3921
476                                      Squeeze            3907
366                             Notorious B.I.G.            3891
251                                  Johnny Cash            3869
234                                       J Cole            3820
73                                         Clash            3785
86                                   David Bowie            3765
278                                        Kinks            3753
621                                         Z-Ro            3748
11                                  Alice Cooper            3744
436                                     R. Kelly            3739
327                                     Megadeth            3730
135                                     Everlast            3697
595                                          XTC            3692
212                                     Iggy Pop            3688
471                                       Slayer            3672
257                                 Judas Priest            3663
512                               Tragically Hip            3622
205                           Horrible Histories            3611
228                                  Iron Maiden            3608
65                                   Chris Brown            3604
117                                   Elton John            3601
180                             Gordon Lightfoot            3575
329                                    Metallica            3573
3                                   Adam Sandler            3560
457                                  Rod Stewart            3548
5                                      Aerosmith            3532
505                                  Tom T. Hall            3524
192                            Hank Williams Jr.            3518
37                                    Billy Joel            3503
573                                          Who            3493
182                                Grateful Dead            3438
270                                Kenny Chesney            3433
411                                       Pogues            3431
155                                  Frank Zappa            3428
237                                 James Taylor            3428
100                                 Dolly Parton            3355
399                                Pet Shop Boys            3350
198                                    Helloween            3346
34                                  Bette Midler            3335
244                                  John Denver            3330
218                                      Incubus            3324
615                                  Young Jeezy            3312
519                                         UB40            3307
425                                        Queen            3296
248                              John Mellencamp            3294
394                                  Patti Smith            3286
185                                    Green Day            3277
78                                Counting Crows            3276
458                               Rolling Stones            3276
85                               David Allan Coe            3260
397                                    Pearl Jam            3258
264                                    Kate Bush            3255
318                               Marilyn Manson            3250
295                                Leonard Cohen            3237
109                                Dream Theater            3215
347                                     Nazareth            3211
8                                        Alabama            3199
299                                  Linkin Park            3175
167                                      Genesis            3169
315                                 Mariah Carey            3161
121                               Emmylou Harris            3155
388                                     Overkill            3152
175                                         Glee            3151
91                                   Deep Purple            3146
340                                    Morrissey            3146
44                                     Bob Seger            3126
284                           Kris Kristofferson            3115
45                                      Bon Jovi            3113
69                               Christmas Songs            3111
287                                    Lady Gaga            3109
239                                   Jason Mraz            3108
169                              George Harrison            3094
70                                 Christy Moore            3092
28                              Barbra Streisand            3084
190                                    Hank Snow            3070
351                                   Neil Young            3062
401                                Peter Gabriel            3055
55                                   Carly Simon            3052
348                                        Ne-Yo            3049
362                       Nitty Gritty Dirt Band            3019
235                               Jackson Browne            3018
332                              Michael Jackson            3016
38                                   Bing Crosby            2996
443                                 Randy Travis            2992
82                                  Cyndi Lauper            2985
390                                Ozzy Osbourne            2982
501                                   Tim McGraw            2980
508                                    Tori Amos            2978
455                                      Rihanna            2974
151                                     Flo-Rida            2966
259                                 Judy Garland            2962
439                     Rage Against The Machine            2940
415                                       Primus            2935
373                                 Oingo Boingo            2926
408                                      Pitbull            2923
74                                 Cliff Richard            2922
29                                    Beach Boys            2910
283                                         Korn            2908
483                                        Sting            2906
536                                        Usher            2905
67                            Christina Aguilera            2903
23                                  Arlo Guthrie            2882
482                                Stevie Wonder            2872
407                                   Pink Floyd            2856
559                                 Warren Zevon            2854
260                                Justin Bieber            2850
410                                         P!nk            2848
518                                           U2            2844
184                                Great Big Sea            2837
416                                       Prince            2834
363                                          Noa            2832
104                                 Donna Summer            2823
196                           Harry Connick, Jr.            2817
132                                   Eurythmics            2812
583                                  Wyclef Jean            2811
62                                          Cher            2810
191                                Hank Williams            2805
486                                         Styx            2789
84                            Dave Matthews Band            2786
540                                 Van Morrison            2771
316                           Marianne Faithfull            2770
16                                       America            2768
172                                George Strait            2767
335                                  Miley Cyrus            2762
321                                   Mary Black            2742
71                                   Chuck Berry            2738
15                                    Alphaville            2731
272                                 Kenny Rogers            2731
333                             Michael W. Smith            2725
115                              Ella Fitzgerald            2719
63                                       Chicago            2709
447                                Reba Mcentire            2705
39                                 Black Sabbath            2694
546                                        Venom            2684
40                                          Blur            2682
92                                   Def Leppard            2679
94                                  Depeche Mode            2677
442                                      Ramones            2674
479                                   Steely Dan            2668
344                                 Natalie Cole            2658
603                                          Yes            2658
446                                  Ray Charles            2656
59                                   Celine Dion            2655
154                                Frank Sinatra            2654
359                                  Nina Simone            2652
506                                    Tom Waits            2637
437                                    Radiohead            2616
60                                    Chaka Khan            2613
543                                  Vanilla Ice            2607
95                                          Devo            2602
203                                      Hollies            2602
389                                     Owl City            2601
638                                       ZZ Top            2591
308                              Louis Armstrong            2587
371                                    Offspring            2579
427                                Queen Latifah            2565
288                                 Lana Del Rey            2553
575                                   Will Smith            2553
292                                  Leann Rimes            2551
119                                Elvis Presley            2550
17                                     Amy Grant            2549
145                                 Fall Out Boy            2548
574                             Widespread Panic            2534
176                                Glen Campbell            2532
306                                 Loretta Lynn            2532
330                               Michael Bolton            2525
240                               Jennifer Lopez            2521
498                                   Thin Lizzy            2521
280                               Kirsty Maccoll            2520
89                                   Dean Martin            2518
510                             Townes Van Zandt            2516
562                              Waylon Jennings            2516
265                                   Katy Perry            2512
290                                  Lauryn Hill            2510
50                                Britney Spears            2508
97                                    Diana Ross            2503
188                                Guns N' Roses            2503
312                               Lynyrd Skynyrd            2493
576                                Willie Nelson            2487
236                                      The Jam            2485
418                                 Procol Harum            2482
171                               George Michael            2478
242                                 Jimi Hendrix            2477
114                     Electric Light Orchestra            2473
349                                 Neil Diamond            2472
395                               Paul McCartney            2470
47                                  Bonnie Raitt            2469
271                                Kenny Loggins            2466
286                                Kylie Minogue            2465
414                                   Pretenders            2463
268                               Kelly Clarkson            2454
293                                Lenny Kravitz            2445
209                                   Ian Hunter            2440
170                                 George Jones            2438
281                                         Kiss            2434
376                           Olivia Newton-John            2434
313                                      Madonna            2426
207                                 Howard Jones            2424
534                                   Uriah Heep            2424
152                                 Foo Fighters            2423
627                                    Zebrahead            2423
83                                 Dan Fogelberg            2421
326                                    Meat Loaf            2399
164                                 Garth Brooks            2398
537                                 Utada Hikaru            2394
102                                   Don McLean            2383
127                                      Erasure            2380
66                                     Chris Rea            2364
358                                    Nightwish            2363
398                                   Perry Como            2349
336                                      Misfits            2345
338                                  The Monkees            2345
509                                         Toto            2345
106                                        Doors            2342
77                                 Conway Twitty            2341
339                                  Moody Blues            2339
247                               John McDermott            2338
429                                  Queensryche            2338
1                                           ABBA            2334
551                                   Vince Gill            2334
30                                   The Beatles            2333
444                                Rascal Flatts            2331
187                             Guided By Voices            2328
539                                    Van Halen            2328
560                                     W.A.S.P.            2328
563                                         Ween            2324
110                            Dusty Springfield            2323
134                                    Everclear            2320
581                                  Wiz Khalifa            2320
503                                    Tom Jones            2316
25                                 Avril Lavigne            2314
26                               Backstreet Boys            2313
413                                       Poison            2292
61                                   Cheap Trick            2289
13                                 Alison Krauss            2281
193                                       Hanson            2275
173                                Gino Vannelli            2271
343                                Nat King Cole            2265
403                            Pharrell Williams            2251
298                               Linda Ronstadt            2249
267                                  Keith Urban            2244
466                                    Scorpions            2243
350                                  Neil Sedaka            2237
405                             Phineas And Ferb            2236
520                                          Ufo            2236
478                                   Status Quo            2233
142                                   Faith Hill            2231
412                               Point Of Grace            2231
33                                      Bee Gees            2227
490                                Talking Heads            2220
226                                         INXS            2218
538                                       Utopia            2216
197                                        Heart            2213
452                               Reo Speedwagon            2213
277                                 King Diamond            2206
453                                 Richard Marx            2196
369                                        Oasis            2195
36                                Billie Holiday            2194
150                                Fleetwood Mac            2191
165                                   Gary Numan            2191
459                                      Roxette            2191
245                                  John Legend            2187
255                                      Journey            2185
605                              Ying Yang Twins            2185
311                              Luther Vandross            2183
9                           Alan Parsons Project            2177
46                                      Boney M.            2177
555                                  Vybz Kartel            2176
561                                    Waterboys            2175
461                                  Roy Orbison            2167
98                                  Dire Straits            2163
493                              The Temptations            2149
6                                     Air Supply            2147
564                                       Weezer            2140
356                                   Nickelback            2139
449                             Regine Velasquez            2139
19                                 Andy Williams            2136
274                                  The Killers            2135
43                                    Bob Rivers            2128
254                                  Josh Groban            2124
502                                  Tina Turner            2122
12                               Alice In Chains            2121
511                                Tracy Chapman            2113
557                                Wanda Jackson            2113
392                                  Pat Benatar            2107
105                              Doobie Brothers            2100
177                               Gloria Estefan            2091
487                                      Sublime            2085
320                                     Maroon 5            2082
261                            Justin Timberlake            2080
143                                Faith No More            2057
269                                 Kelly Family            2054
58                                   Cat Stevens            2053
314                                      Manowar            2053
440                                      Rainbow            2046
57                                    Carpenters            2042
361                                      Nirvana            2042
64                                      Children            2039
80                                 Crowded House            2034
279                                Kirk Franklin            2031
14                          Allman Brothers Band            2028
491                                 Taylor Swift            2021
513                                        Train            2018
250                                   John Waite            2007
331                                Michael Buble            2007
113                                Eddie Cochran            1998
93                                   Demi Lovato            1993
304                                   Lloyd Cole            1975
374                                     Old 97's            1974
294                                    Leo Sayer            1973
566                                     Westlife            1962
445                                    Ray Boltz            1959
381                                        Opeth            1957
229                                Irving Berlin            1952
417                                  Proclaimers            1941
450                              Religious Music            1939
460                                   Roxy Music            1924
579                                 Wishbone Ash            1923
480                            Steve Miller Band            1916
208                                 Human League            1906
128                                 Eric Clapton            1901
54                                          Cake            1892
504                                   Tom Lehrer            1892
365                                  Norah Jones            1886
492                              Tears For Fears            1885
112                                   Ed Sheeran            1883
420                                        Q-Tip            1880
606                             Yngwie Malmsteen            1877
195                              Harry Belafonte            1867
368                                       O.A.R.            1865
552                               Violent Femmes            1862
360                              Nine Inch Nails            1860
370                           Ocean Colour Scene            1856
607                                     Yo Gotti            1853
553                                Virgin Steele            1850
385                               Our Lady Peace            1842
572                              Whitney Houston            1841
138                                      Extreme            1837
464                                      Santana            1837
75                                      Coldplay            1835
291                                  Lea Salonga            1834
107                                    Doris Day            1832
241                                    Jim Croce            1819
296                               Les Miserables            1818
276                                 King Crimson            1812
523                                     Ultravox            1805
489                             System Of A Down            1803
488                                   Supertramp            1799
609                                     Yoko Ono            1794
612                                     You Am I            1791
52                                    Bruno Mars            1787
149                                  Fiona Apple            1787
352                                    New Order            1786
616                                  Youngbloodz            1784
300                                Lionel Richie            1774
631                                 Ziggy Marley            1774
404                                 Phil Collins            1773
166                              Gary Valenciano            1771
202                                          HIM            1766
238                                 Janis Joplin            1764
469                                          Sia            1762
122                        Engelbert Humperdinck            1760
162                                    Freestyle            1745
379                                One Direction            1745
522                           Ultramagnetic Mc's            1742
116                               Ellie Goulding            1741
101                                   Don Henley            1732
342                                      'n Sync            1714
514                                       Travis            1709
310                                   Lucky Dube            1703
524                                Uncle Kracker            1700
599                                     Yelawolf            1698
472                                       Smiths            1694
42                                    Bob Marley            1691
124                             Enrique Iglesias            1679
428                      Queens Of The Stone Age            1679
474                                  Soundgarden            1675
201                              Hillsong United            1668
613                                   Young Buck            1664
153                                    Foreigner            1661
181                          Grand Funk Railroad            1653
266                                  Keith Green            1653
258                                        Judds            1647
500                                  Tim Buckley            1639
2                                    Ace Of Base            1634
570                            The White Stripes            1633
571                                   Whitesnake            1633
90                                         Death            1621
372                                    Ofra Haza            1610
378                                          Omd            1609
230                               Isley Brothers            1605
384                                 Otis Redding            1599
130                                   Etta James            1585
81                                  Culture Club            1584
337                               Modern Talking            1579
535                                         Used            1579
485                          Stone Temple Pilots            1557
140                                 Face To Face            1546
200                                     Hillsong            1538
432                                   Quiet Riot            1526
275                                    Kim Wilde            1521
600                                        Yello            1520
355                                   Nick Drake            1500
507                                         Tool            1497
346                            Natalie Imbruglia            1486
49                                         Bread            1470
345                                Natalie Grant            1463
391                                    Passenger            1455
580                            Within Temptation            1455
146                                     Fastball            1454
211                                 Idina Menzel            1452
441                                    Rammstein            1448
526                                    Underoath            1448
147                                  Fatboy Slim            1437
527                                   Underworld            1426
220                        Indiana Bible College            1424
541                             Vanessa Williams            1422
21                                 Ariana Grande            1419
515                            Twenty One Pilots            1418
601                                   Yellowcard            1417
20                                         Annie            1416
256                                 Joy Division            1416
131                                       Europe            1413
402                                   Peter Tosh            1411
554                                Vonda Shepard            1409
325                                    Mc Hammer            1403
548                             Vertical Horizon            1394
544                           Velvet Underground            1391
377                                    Olly Murs            1390
323                                  Matt Redman            1388
334                                        Migos            1386
400                                 Peter Cetera            1383
451                                          Rem            1367
386                                  Out Of Eden            1360
217                                    Incognito            1352
246                                  John Martyn            1336
499                                      Tiffany            1330
133                                  Evanescence            1328
4                                          Adele            1316
475                               Spandau Ballet            1316
48                                        Bosson            1314
496                                   The Script            1309
618                                     Yukmouth            1303
87                                  David Guetta            1302
141                                        Faces            1302
178                                Gloria Gaynor            1302
53                                   Bryan White            1295
610                                Yolanda Adams            1295
285                                         Kyla            1291
72                                    Cinderella            1282
567                                  Wet Wet Wet            1282
222                            Ingrid Michaelson            1279
375                                       Oliver            1276
393                                  Patsy Cline            1262
569                                  Whiskeytown            1262
528                                      Unearth            1261
204                                 Hooverphonic            1257
525                                 Uncle Tupelo            1241
328                                  Men At Work            1238
18                                Andrea Bocelli            1226
157                                Frankie Laine            1223
619                                     Yung Joc            1222
549                                  Veruca Salt            1219
183                                       Grease            1213
521                                 Ugly Kid Joe            1209
484                                  Stone Roses            1208
608                                  Yo La Tengo            1207
111                                       Eagles            1200
622                               Zac Brown Band            1192
533                                Unwritten Law            1191
633                                      Zoegirl            1186
532                                       Unseen            1183
467                                        Selah            1181
468                                 Selena Gomez            1180
289                              Lata Mangeshkar            1177
465                                Savage Garden            1164
309                                 Louis Jordan            1162
301                                   Little Mix            1139
79                  Creedence Clearwater Revival            1124
168                                George Formby            1124
214                              Imagine Dragons            1121
129                                  Erik Santos            1112
383                            Oscar Hammerstein            1105
604                                           YG            1100
598                              Yeah Yeah Yeahs            1095
586                                     X-Raided            1092
156                    Frankie Goes To Hollywood            1082
454                                  Rick Astley            1080
426                                Queen Adreena            1072
550                                        Verve            1071
614                                    Young Dro            1064
636                                     Zucchero            1058
433                                   Quietdrive            1055
594                                       Xscape            1042
409                                Planetshakers            1035
10                                    Aled Jones            1030
382                                Orphaned Land            1022
497                                   The Weeknd            1022
380                                  OneRepublic            1021
624                                          Zao            1020
158                                Frankie Valli            1018
477                                     Starship            1018
591                                  Xavier Rudd            1017
161                                         Free            1003
123                                       Enigma             992
99                                        Divine             990
194                                Happy Mondays             987
558                                   Wang Chung             986
163                                         Fun.             983
435                                  Quincy Punx             981
322                                   Matt Monro             979
7                                  Aiza Seguerra             977
206                                 Housemartins             972
495                                The Broadways             963
216                                    Imperials             957
422                                     Quarashi             957
144                                        Falco             956
35                                  Bill Withers             944
125                                         Enya             940
22                                  Ariel Rivera             936
68                               Christina Perri             931
324                                   Mazzy Star             914
494                              Ten Years After             911
96                                       Dewa 19             904
530                                        Unkle             892
568                                        Wham!             887
233                                    Iwan Fals             877
438                                        Raffi             869
577                              Wilson Phillips             868
213                                      Il Divo             865
319                                  Mark Ronson             864
593                                      Xiu Xiu             854
424                                        Quasi             842
76                                   Cole Porter             828
589                                      Xandria             827
189                                    Halloween             800
27                                        Barbie             798
587                                   X-Ray Spex             798
635                                          Zox             795
341                                          Mud             794
423                                 Quarterflash             790
434                                 Quincy Jones             790
56                                  Carol Banawa             789
32                          Beauty And The Beast             788
629                                       Zero 7             778
199                          High School Musical             777
584                                            X             777
103                                     Don Moen             775
597                                        Yazoo             773
225                                   Inside Out             771
253                               Jose Mari Chan             771
305                                        Lorde             767
263                                    Kari Jobe             759
630                                   Zeromancer             734
481                           Stevie Ray Vaughan             725
578                               Wilson Pickett             720
623                                   Zakk Wylde             718
160                                 Freddie King             703
617                               Youth Of Today             703
215                                        Imago             690
223                                         Inna             689
148                                Fifth Harmony             686
585                                      X Japan             680
542                                     Vangelis             666
232                              Israel Houghton             665
430                                    Quicksand             664
626                                        Zebra             660
641                      Van Der Graaf Generator             658
219                             Independence Day             653
159                              Freddie Aguilar             644
463                                    Sam Smith             636
88                                David Pomeranz             635
625                                   Zayn Malik             633
516                                     U. D. O.             625
620                                  Yusuf Islam             604
431                Quicksilver Messenger Service             597
639 Joseph And The Amazing Technicolor Dreamcoat             589
637                                         Zwan             577
421                                        Qntal             567
611                  Yonder Mountain String Band             546
231                                       Israel             539
353                               Next To Normal             499
602                             Yeng Constantino             493
470                                       Side A             464
367                                       O-Zone             460
545                                    Vengaboys             460
592                                      Xentrix             458
556                                Walk The Moon             454
179                                          GMB             453
547                                    Vera Lynn             424
227                               Iron Butterfly             422
634                                       Zornik             383
302                                Little Walter             381
136                                          Exo             375
590                                Xavier Naidoo             345
137                                        Exo-K             315
282                                    Koes Plus             299
531                                      Unknown             287
640                                  Soundtracks             238
642                              Various Artists             207
174                                  Gipsy Kings             195
643                                        Zazie             182
126                                Eppu Normaali             154
529                                         Ungu              86
628                                          Zed              77
517                                       U-Kiss              76
632                                          Zoe              56
588                                      X-Treme              37

This code initializes an empty data frame, vocabulary_df, to store each artist's vocabulary size. It then loops over the artists in tokenized_words_list, computes each artist's vocabulary size (the number of unique words among their tokenized lyrics), and appends the artist's name and vocabulary size to vocabulary_df with rbind(). After the loop, the data frame is sorted in descending order of vocabulary size and printed, so the artists with the largest vocabularies appear at the top and those with the smallest at the bottom.
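Growing a data frame with rbind() inside a loop works, but it copies the data frame on every iteration. As a minimal alternative sketch, assuming the input is a named list with one character vector of tokens per artist, the same table can be built in one step with vapply(). The toy list and the names toy_tokens and toy_vocab_df are hypothetical and are not the report's data.

```r
# Hypothetical toy input: one character vector of tokens per artist
toy_tokens <- list(
  "Artist A" = c("love", "me", "love", "you"),
  "Artist B" = c("la", "la", "la"),
  "Artist C" = c("one", "two", "three", "four", "one")
)

# Vocabulary size = number of distinct tokens for each artist
toy_sizes <- vapply(
  toy_tokens,
  function(words) length(unique(words)),
  integer(1)
)

# Assemble the result in a single step instead of repeated rbind() calls
toy_vocab_df <- data.frame(
  artist = names(toy_sizes),
  vocabulary_size = unname(toy_sizes),
  stringsAsFactors = FALSE
)

# Sort from largest to smallest vocabulary
toy_vocab_df <- toy_vocab_df[order(-toy_vocab_df$vocabulary_size), ]
print(toy_vocab_df)
```

The result has the same two columns as vocabulary_df, so the rest of the analysis would not need to change.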

In vocabulary_df, each row represents one artist, with columns for the artist's name (artist) and vocabulary size (vocabulary_size). Individual columns can be referenced as vocabulary_df$column_name. For example, vocabulary_df$vocabulary_size[1] returns the vocabulary size in the first row, which after sorting is the largest in the dataset.
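As a small illustration of this kind of indexing, the sketch below uses a three-row stand-in for vocabulary_df. The name vocab_example is hypothetical; the three rows are copied from the printed output above.

```r
# Stand-in data frame with three rows copied from the printed output
vocab_example <- data.frame(
  artist = c("Cake", "Tom Lehrer", "Norah Jones"),
  vocabulary_size = c(1892, 1892, 1886),
  stringsAsFactors = FALSE
)

# A whole column, then a single element selected by position
print(vocab_example$vocabulary_size)
print(vocab_example$vocabulary_size[1])

# Look up one artist by name instead of by position
norah_size <- vocab_example$vocabulary_size[vocab_example$artist == "Norah Jones"]
print(norah_size)

# Select the rows for several artists at once
selected <- subset(vocab_example, artist %in% c("Cake", "Norah Jones"))
print(selected)
```

Looking up by name is usually safer than looking up by position, because the row order changes whenever the data frame is re-sorted.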

# Select the five largest and five smallest vocabularies
# (vocabulary_df is already sorted in descending order)
top5_largest <- vocabulary_df[1:5, ]
top5_smallest <- vocabulary_df[(nrow(vocabulary_df) - 4):nrow(vocabulary_df), ]

# Combine the top 5 largest and smallest dataframes
combined_df <- rbind(top5_largest, top5_smallest)

# Add a column to indicate whether the artist has the largest or smallest vocabulary
combined_df$category <- ifelse(combined_df$artist %in% top5_largest$artist, "Largest", "Smallest")

# Artists are ordered by vocabulary size within each facet by reorder() in the plot call

# Plot the combined data
ggplot(combined_df, aes(x = reorder(artist, vocabulary_size), y = vocabulary_size)) +
  geom_bar(stat = "identity", aes(fill = category)) +
  facet_wrap(~ category, scales = "free", nrow = 1) +
  labs(title = "Top 5 Artists with Largest and Smallest Vocabulary",
       x = "Artist",
       y = "Vocabulary Size",
       fill = "Category") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The interactive scatter plot visualizes the relationship between the number of songs each artist has in the dataset and the size of their combined lyrics, measured here as the number of unique whitespace-separated words (the text_combined column computed below). Each point represents an artist, with the number of songs on the x-axis and the word count on the y-axis. Hovering over a point reveals the artist's name. This makes it easy to spot patterns and outliers, such as artists whose lyrics contain far more distinct words than their song count alone would suggest.
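The counts computed below come from splitting each artist's combined lyrics on whitespace and counting the distinct pieces. A minimal sketch on a made-up string (not taken from the dataset) shows what that measure treats as different words. The normalization step at the end is an illustrative assumption and is not part of the report's code.

```r
# Made-up lyric fragment for illustration only
lyrics <- "Love me, love me   not. LOVE"

# Split on runs of whitespace, as in the aggregate() call below
raw_tokens <- unlist(strsplit(lyrics, "\\s+"))
print(raw_tokens)

# Case and attached punctuation make tokens distinct:
# "Love", "love", and "LOVE" are counted separately, as are "me," and "me"
raw_unique <- length(unique(raw_tokens))
print(raw_unique)

# Assumed normalization: lowercase and strip punctuation before counting
clean_tokens <- gsub("[[:punct:]]", "", tolower(raw_tokens))
clean_unique <- length(unique(clean_tokens))
print(clean_unique)
```

On this string the raw count is 6 and the normalized count is 3, so whitespace splitting alone tends to overstate the number of distinct words.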

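# Count the unique whitespace-separated words in each artist's combined lyrics
# (the resulting count column keeps the name text_combined)
unique_words <- aggregate(text_combined ~ artist, data = all_artists_df, FUN = function(x) length(unique(unlist(strsplit(x, "\\s+")))))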

# Compute the frequency of each artist in df_songs$artist
artist_counts <- as.data.frame(table(df_songs$artist), stringsAsFactors = FALSE)
colnames(artist_counts) <- c("artist", "num_songs")

# Merge artist counts with unique_words dataframe
unique_words <- merge(unique_words, artist_counts, by = "artist", all.x = TRUE)

# View the updated unique_words dataframe
print(unique_words)
                                          artist text_combined num_songs
1                                        'n Sync          2828        93
2                                           ABBA          3224       113
3                                    Ace Of Base          2317        74
4                                   Adam Sandler          5135        70
5                                          Adele          1835        54
6                                      Aerosmith          5038       171
7                                     Air Supply          3170       174
8                                  Aiza Seguerra          1202        25
9                                        Alabama          4525       187
10                          Alan Parsons Project          2886       102
11                                    Aled Jones          1275        23
12                                  Alice Cooper          5237       174
13                               Alice In Chains          2957        95
14                                 Alison Krauss          3205       145
15                          Allman Brothers Band          3186       116
16                                    Alphaville          3533       105
17                                       America          3931       184
18                                     Amy Grant          3753       147
19                                Andrea Bocelli          1529        25
20                                 Andy Williams          2994       138
21                                         Annie          1989        32
22                                 Ariana Grande          2032        51
23                                  Ariel Rivera          1175        19
24                                  Arlo Guthrie          3838       113
25                                Arrogant Worms          5323        89
26                                 Avril Lavigne          3579       143
27                               Backstreet Boys          3792       164
28                                        Barbie          1073        18
29                              Barbra Streisand          4566       157
30                                    Beach Boys          4204       151
31                               Beautiful South          5306       149
32                          Beauty And The Beast          1062        12
33                                      Bee Gees          3529       170
34                                  Bette Midler          5571       158
35                                  Bill Withers          1267        35
36                                Billie Holiday          2954       150
37                                    Billy Joel          4664       141
38                                   Bing Crosby          4184       157
39                                 Black Sabbath          4066       156
40                                          Blur          3512       136
41                                     Bob Dylan          7384       188
42                                    Bob Marley          2806        86
43                                    Bob Rivers          2739        48
44                                     Bob Seger          4214       158
45                                      Bon Jovi          4549       181
46                                      Boney M.          3040        98
47                                  Bonnie Raitt          3619       149
48                                        Bosson          1800        52
49                                         Bread          2052        75
50                                Britney Spears          4198       158
51                             Bruce Springsteen          5731       175
52                                    Bruno Mars          2748        70
53                                   Bryan White          1713        48
54                                          Cake          2403        73
55                                   Carly Simon          4058       168
56                                  Carol Banawa           987        23
57                                    Carpenters          2872       112
58                                   Cat Stevens          2988       114
59                                   Celine Dion          3545       116
60                                    Chaka Khan          3873       186
61                                   Cheap Trick          3534       170
62                                          Cher          3886       187
63                                       Chicago          4050       160
64                                      Children          3233        90
65                                   Chris Brown          5961       145
66                                     Chris Rea          3214       182
67                            Christina Aguilera          4553       146
68                               Christina Perri          1241        36
69                               Christmas Songs          4502       140
70                                 Christy Moore          3968        64
71                                   Chuck Berry          3859       127
72                                    Cinderella          1652        48
73                                         Clash          4884       118
74                                 Cliff Richard          4311       184
75                                      Coldplay          2596       120
76                                   Cole Porter           991        17
77                                 Conway Twitty          3284       162
78                                Counting Crows          4715       158
79                  Creedence Clearwater Revival          1597        43
80                                 Crowded House          2571        76
81                                  Culture Club          2202        76
82                                  Cyndi Lauper          4243       161
83                                 Dan Fogelberg          3120       108
84                            Dave Matthews Band          4336       168
85                               David Allan Coe          4390       162
86                                   David Bowie          5354       165
87                                  David Guetta          1922        63
88                                David Pomeranz           798        17
89                                   Dean Martin          3405       186
90                                         Death          2080        60
91                                   Deep Purple          4356       179
92                                   Def Leppard          3858       147
93                                   Demi Lovato          3005       105
94                                  Depeche Mode          3408       167
95                                          Devo          3320       125
96                                       Dewa 19          1067        21
97                                    Diana Ross          3545       167
98                                  Dire Straits          2646        61
99                                        Divine          1363        33
100                                 Dolly Parton          4394       180
101                                   Don Henley          2215        46
102                                   Don McLean          3108        67
103                                     Don Moen          1022        39
104                                 Donna Summer          4092       191
105                              Doobie Brothers          2916       110
106                                        Doors          3451        97
107                                    Doris Day          2465        74
108                                        Drake          7084       117
109                                Dream Theater          4528        97
110                            Dusty Springfield          3455       175
111                                       Eagles          1604        41
112                                   Ed Sheeran          2480        53
113                                Eddie Cochran          2828       128
114                     Electric Light Orchestra          3749       148
115                              Ella Fitzgerald          3775       163
116                               Ellie Goulding          2414        77
117                                   Elton John          4523       175
118                               Elvis Costello          5013       146
119                                Elvis Presley          3532       168
120                                       Eminem          7450        70
121                               Emmylou Harris          4097       173
122                        Engelbert Humperdinck          2574       125
123                                       Enigma          1283        35
124                             Enrique Iglesias          2429        79
125                                         Enya          1318        40
126                                Eppu Normaali           178         3
127                                      Erasure          3392       138
128                                 Eric Clapton          2957       152
129                                  Erik Santos          1414        40
130                                   Etta James          2393        94
131                                       Europe          1981        82
132                                   Eurythmics          3805       142
133                                  Evanescence          1955        77
134                                    Everclear          3165       123
135                                     Everlast          4987        78
136                                          Exo           417         3
137                                        Exo-K           332         2
138                                      Extreme          2504        70
139                                     Fabolous          8854       115
140                                 Face To Face          1967        93
141                                        Faces          1647        37
142                                   Faith Hill          3088       109
143                                Faith No More          2794        71
144                                        Falco          1154        18
145                                 Fall Out Boy          3694        97
146                                     Fastball          1920        59
147                                  Fatboy Slim          1954        40
148                                Fifth Harmony           946        17
149                                  Fiona Apple          2296        56
150                                Fleetwood Mac          3153       180
151                                     Flo-Rida          4266        59
152                                 Foo Fighters          3425       142
153                                    Foreigner          2403        86
154                                Frank Sinatra          3971       154
155                                  Frank Zappa          5081        99
156                    Frankie Goes To Hollywood          1488        28
157                                Frankie Laine          1682        41
158                                Frankie Valli          1488        48
159                              Freddie Aguilar           756        13
160                                 Freddie King           926        30
161                                         Free          1350        39
162                                    Freestyle          2256        46
163                                         Fun.          1322        18
164                                 Garth Brooks          3026        85
165                                   Gary Numan          3320       170
166                              Gary Valenciano          2381        56
167                                      Genesis          4789       141
168                                George Formby          1431        17
169                              George Harrison          4114       157
170                                 George Jones          3275       137
171                               George Michael          3933       122
172                                George Strait          3849       188
173                                Gino Vannelli          2707        94
174                                  Gipsy Kings           216         5
175                                         Glee          5012       164
176                                Glen Campbell          3288       159
177                               Gloria Estefan          2972       102
178                                Gloria Gaynor          1812        49
179                                          GMB           521        13
180                             Gordon Lightfoot          4700       189
181                          Grand Funk Railroad          2692        89
182                                Grateful Dead          5090       165
183                                       Grease          1656        31
184                                Great Big Sea          3709        93
185                                    Green Day          4587       174
186                                   Gucci Mane          6936        84
187                             Guided By Voices          2840        97
188                                Guns N' Roses          3395        93
189                                    Halloween          1074        18
190                                    Hank Snow          4089       158
191                                Hank Williams          3888       160
192                            Hank Williams Jr.          4848       185
193                                       Hanson          3228       129
194                                Happy Mondays          1220        24
195                              Harry Belafonte          2553        73
196                           Harry Connick, Jr.          3742       145
197                                        Heart          3153       122
198                                    Helloween          4495       162
199                          High School Musical          1130        18
200                                     Hillsong          2377       172
201                              Hillsong United          2517       164
202                                          HIM          2377        98
203                                      Hollies          3663       154
204                                 Hooverphonic          1527        56
205                           Horrible Histories          5102        54
206                                 Housemartins          1163        23
207                                 Howard Jones          3073       102
208                                 Human League          2589        71
209                                   Ian Hunter          3353        77
210                                     Ice Cube          7372        80
211                                 Idina Menzel          1864        33
212                                     Iggy Pop          4981       177
213                                      Il Divo          1129        23
214                              Imagine Dragons          1480        41
215                                        Imago           810        15
216                                    Imperials          1248        27
217                                    Incognito          1920        60
218                                      Incubus          4447       118
219                             Independence Day           796        11
220                        Indiana Bible College          2168        93
221                                 Indigo Girls          5845       184
222                            Ingrid Michaelson          1741        69
223                                         Inna          1001        36
224                           Insane Clown Posse         10406       136
225                                   Inside Out          1063        20
226                                         INXS          3013       140
227                               Iron Butterfly           530        17
228                                  Iron Maiden          5037       156
229                                Irving Berlin          2507        70
230                               Isley Brothers          2561        73
231                                       Israel           728        28
232                              Israel Houghton           928        24
233                                    Iwan Fals          1037        19
234                                       J Cole          5423        68
235                               Jackson Browne          3959       139
236                                 James Taylor          5341       177
237                                 Janis Joplin          3048       106
238                                   Jason Mraz          4184       101
239                               Jennifer Lopez          3866       110
240                                    Jim Croce          2236        66
241                                 Jimi Hendrix          3949       127
242                                Jimmy Buffett          6773       164
243                                  John Denver          4456       168
244                                  John Legend          3259        93
245                                  John Martyn          1825        61
246                               John McDermott          2845        63
247                              John Mellencamp          4479       152
248                                   John Prine          5286       170
249                                   John Waite          2504        86
250                                  Johnny Cash          5167       183
251                                Joni Mitchell          6898       170
252                               Jose Mari Chan           959        19
253 Joseph And The Amazing Technicolor Dreamcoat           706         8
254                                  Josh Groban          2825        85
255                                      Journey          3472       150
256                                 Joy Division          1969        49
257                                 Judas Priest          4830       159
258                                        Judds          2178        71
259                                 Judy Garland          4278       137
260                                Justin Bieber          4403       131
261                            Justin Timberlake          3111        60
262                                   Kanye West          7442       106
263                                    Kari Jobe          1002        38
264                                    Kate Bush          5085       153
265                                   Katy Perry          3422        89
266                                  Keith Green          2467        63
267                                  Keith Urban          3066       110
268                               Kelly Clarkson          3747       157
269                                 Kelly Family          2806        97
270                                Kenny Chesney          4921       173
271                                Kenny Loggins          3891       153
272                                 Kenny Rogers          3947       174
273                                     Kid Rock          5542       100
274                                    Kim Wilde          2093        70
275                                 King Crimson          2248        44
276                                 King Diamond          3822       112
277                                        Kinks          5661       170
278                                Kirk Franklin          3375       111
279                               Kirsty Maccoll          3125       108
280                                         Kiss          3804       183
281                                    Koes Plus           348        10
282                                         Korn          4632       166
283                           Kris Kristofferson          4240       170
284                                         Kyla          1751        46
285                                Kylie Minogue          3790       172
286                                    Lady Gaga          4894       137
287                                 Lana Del Rey          4119       113
288                              Lata Mangeshkar          1432        32
289                                  Lauryn Hill          3505        48
290                                  Lea Salonga          2521        74
291                                  Leann Rimes          3688       158
292                                Lenny Kravitz          3224       156
293                                    Leo Sayer          2562        85
294                                Leonard Cohen          4515       116
295                               Les Miserables          2741        42
296                                    Lil Wayne          9534       125
297                               Linda Ronstadt          2875       150
298                                  Linkin Park          4539       125
299                                Lionel Richie          2793       121
300                                   Little Mix          1635        35
301                                Little Walter           486        13
302                                    LL Cool J         10104       113
303                                   Lloyd Cole          2543        86
304                                        Lorde           928        15
305                                 Loretta Lynn          3153       187
306                                     Lou Reed          5415       164
307                              Louis Armstrong          3680       129
308                                 Louis Jordan          1515        27
309                                   Lucky Dube          2357        92
310                              Luther Vandross          3489       137
311                               Lynyrd Skynyrd          3562       144
312                                      Madonna          3422        88
313                                      Manowar          3188        88
314                                 Mariah Carey          4895       159
315                           Marianne Faithfull          4206       160
316                                    Marillion          6002       149
317                               Marilyn Manson          4738       166
318                                  Mark Ronson          1075        18
319                                     Maroon 5          3056       110
320                                   Mary Black          3480       108
321                                   Matt Monro          1286        41
322                                  Matt Redman          1981        93
323                                   Mazzy Star          1191        46
324                                    Mc Hammer          2004        18
325                                    Meat Loaf          3511        92
326                                     Megadeth          5048       133
327                                  Men At Work          1536        33
328                                    Metallica          5213       155
329                               Michael Bolton          3728       167
330                                Michael Buble          2799       112
331                              Michael Jackson          4928       176
332                             Michael W. Smith          3832       176
333                                        Migos          1826        15
334                                  Miley Cyrus          4288       147
335                                      Misfits          3137       122
336                               Modern Talking          2650       144
337                                  Moody Blues          3583       174
338                                    Morrissey          4399       177
339                                          Mud          1074        26
340                                Nat King Cole          3296       149
341                                 Natalie Cole          4269       155
342                                Natalie Grant          2106        65
343                            Natalie Imbruglia          2047        72
344                                     Nazareth          4401       184
345                                        Ne-Yo          5049       146
346                                 Neil Diamond          3507       173
347                                  Neil Sedaka          3191        97
348                                   Neil Young          4538       185
349                                    New Order          2434        99
350                               Next To Normal           652         9
351                                    Nick Cave          6299       172
352                                   Nick Drake          1975        67
353                                   Nickelback          3094        92
354                                  Nicki Minaj          6033        88
355                                    Nightwish          3115        81
356                                  Nina Simone          3687       158
357                              Nine Inch Nails          2660       108
358                                      Nirvana          2736       103
359                       Nitty Gritty Dirt Band          4090       125
360                                          Noa          3742        90
361                                         NOFX          5011       143
362                                  Norah Jones          2462       105
363                             Notorious B.I.G.          5648        50
364                                       O-Zone           621        12
365                                       O.A.R.          2837        78
366                                        Oasis          3030       149
367                           Ocean Colour Scene          2319       100
368                                    Offspring          3422       119
369                                    Ofra Haza          1994        41
370                                 Oingo Boingo          4222       103
371                                     Old 97's          2504        74
372                                       Oliver          1751        32
373                           Olivia Newton-John          3313       146
374                                    Olly Murs          2037        54
375                                          Omd          2085        78
376                                One Direction          2646        98
377                                  OneRepublic          1408        36
378                                        Opeth          2543        59
379                                Orphaned Land          1219        20
380                            Oscar Hammerstein          1387        20
381                                 Otis Redding          2564       112
382                               Our Lady Peace          2560        99
383                                  Out Of Eden          1947        36
384                                      Outkast          6521        84
385                                     Overkill          5288       135
386                                     Owl City          3453        77
387                                Ozzy Osbourne          4248       157
388                                         P!nk          4566       120
389                                    Passenger          1806        35
390                                  Pat Benatar          2922       106
391                                  Patsy Cline          1708        88
392                                  Patti Smith          4456       104
393                               Paul McCartney          4007       169
394                                   Paul Simon          5289       156
395                                    Pearl Jam          4983       164
396                                   Perry Como          3701       148
397                                Pet Shop Boys          4755       164
398                                 Peter Cetera          1842        75
399                                Peter Gabriel          4043        96
400                                   Peter Tosh          1883        50
401                            Pharrell Williams          3033        30
402                                 Phil Collins          2794       114
403                             Phineas And Ferb          3121        67
404                                        Phish          5511       163
405                                   Pink Floyd          4100       111
406                                      Pitbull          4366        72
407                                Planetshakers          1528       116
408                                       Pogues          4193       100
409                               Point Of Grace          3048       113
410                                       Poison          3198        96
411                                   Pretenders          3163        96
412                                       Primus          3763        83
413                                       Prince          4676       106
414                                  Proclaimers          2521        76
415                                 Procol Harum          2978        86
416                                   Puff Daddy          5972        61
417                                        Q-Tip          2487        17
418                                        Qntal           631         8
419                                     Quarashi          1187        10
420                                 Quarterflash           985        23
421                                        Quasi          1005        25
422                                        Queen          4664       163
423                                Queen Adreena          1366        41
424                                Queen Latifah          3553        50
425                      Queens Of The Stone Age          2234        68
426                                  Queensryche          3465        91
427                                    Quicksand           970        19
428                Quicksilver Messenger Service           811        15
429                                   Quiet Riot          2181        57
430                                   Quietdrive          1429        36
431                                 Quincy Jones          1055        19
432                                  Quincy Punx          1149        18
433                                     R. Kelly          6207       145
434                                    Radiohead          3535       150
435                                        Raffi          1246        36
436                     Rage Against The Machine          3790        55
437                                      Rainbow          2656        64
438                                    Rammstein          1706        44
439                                      Ramones          3630       172
440                                 Randy Travis          4129       177
441                                Rascal Flatts          3266       111
442                                    Ray Boltz          2759        97
443                                  Ray Charles          4110       167
444                                Reba Mcentire          3709       187
445                        Red Hot Chili Peppers          5828       173
446                             Regine Velasquez          3007       101
447                              Religious Music          2920        83
448                                          Rem          1828        42
449                               Reo Speedwagon          3128       122
450                                 Richard Marx          3099       121
451                                  Rick Astley          1544        56
452                                      Rihanna          4689       143
453                              Robbie Williams          5468       166
454                                  Rod Stewart          5118       178
455                               Rolling Stones          4598       179
456                                      Roxette          3384       138
457                                   Roxy Music          2461        64
458                                  Roy Orbison          3402       178
459                                         Rush          5622       175
460                                    Sam Smith           804        20
461                                      Santana          2512       103
462                                Savage Garden          1428        29
463                                    Scorpions          3124       167
464                                        Selah          1552        48
465                                 Selena Gomez          1730        44
466                                          Sia          2262        77
467                                       Side A           588        11
468                                       Slayer          4849       119
469                                       Smiths          2332        67
470                                   Snoop Dogg          6552        71
471                                  Soundgarden          2108        72
472                                  Soundtracks           284         3
473                               Spandau Ballet          1683        42
474                                      Squeeze          4803       146
475                                     Starship          1436        34
476                                   Status Quo          3168       162
477                                   Steely Dan          3203        88
478                            Steve Miller Band          2524       109
479                           Stevie Ray Vaughan           911        27
480                                Stevie Wonder          4099       139
481                                        Sting          3641        90
482                                  Stone Roses          1475        37
483                          Stone Temple Pilots          2038        62
484                                         Styx          3732       121
485                                      Sublime          3110        63
486                                   Supertramp          2487        74
487                             System Of A Down          2481        67
488                                Talking Heads          3298        77
489                                 Taylor Swift          2887        81
490                              Tears For Fears          2442        66
491                              Ten Years After          1235        47
492                                  The Beatles          3595       178
493                                The Broadways          1192        14
494                                      The Jam          3263        86
495                                  The Killers          2969        75
496                                  The Monkees          3523       148
497                                   The Script          1706        32
498                              The Temptations          3684       117
499                                   The Weeknd          1365        29
500                            The White Stripes          2035        64
501                                   Thin Lizzy          3378       109
502                                      Tiffany          1692        49
503                                  Tim Buckley          2079        58
504                                   Tim McGraw          4120       148
505                                  Tina Turner          2913       112
506                                    Tom Jones          3323       141
507                                   Tom Lehrer          2369        23
508                                  Tom T. Hall          4400       160
509                                    Tom Waits          3296        70
510                                         Tool          2099        36
511                                    Tori Amos          3928       110
512                                         Toto          3190       127
513                             Townes Van Zandt          3177        90
514                                Tracy Chapman          2530        82
515                               Tragically Hip          4804       132
516                                        Train          2621        81
517                                       Travis          2236        88
518                            Twenty One Pilots          1981        33
519                                       U-Kiss            82         1
520                                     U. D. O.           721        12
521                                           U2          3956       133
522                                         UB40          4341       140
523                                          Ufo          3022        95
524                                 Ugly Kid Joe          1548        36
525                           Ultramagnetic Mc's          2285        16
526                                     Ultravox          2322        61
527                                Uncle Kracker          2099        42
528                                 Uncle Tupelo          1478        40
529                                    Underoath          1850        46
530                                   Underworld          1941        41
531                                      Unearth          1706        39
532                                         Ungu            96         2
533                                        Unkle          1244        28
534                                      Unknown           329         4
535                                       Unseen          1490        35
536                                Unwritten Law          1531        42
537                                   Uriah Heep          3349       168
538                                         Used          2310        67
539                                        Usher          4623       117
540                                 Utada Hikaru          3064        51
541                                       Utopia          2856        86
542                      Van Der Graaf Generator           780         5
543                                    Van Halen          3684       102
544                                 Van Morrison          3987       150
545                             Vanessa Williams          1971        68
546                                     Vangelis           855        18
547                                  Vanilla Ice          3539        35
548                              Various Artists           230         3
549                           Velvet Underground          1803        48
550                                    Vengaboys           598        13
551                                        Venom          3517        90
552                                    Vera Lynn           508        12
553                             Vertical Horizon          1822        58
554                                  Veruca Salt          1621        48
555                                        Verve          1458        51
556                                   Vince Gill          3064       171
557                               Violent Femmes          2547        83
558                                Virgin Steele          2706        50
559                                Vonda Shepard          1848        61
560                                  Vybz Kartel          2845        37
561                                     W.A.S.P.          3248       112
562                                Walk The Moon           560        11
563                                Wanda Jackson          2703       145
564                                   Wang Chung          1246        28
565                                 Warren Zevon          3490       105
566                                    Waterboys          2730        72
567                              Waylon Jennings          3240       152
568                                         Ween          3168        97
569                                       Weezer          2845        89
570                            Weird Al Yankovic          6422       106
571                                     Westlife          2936       134
572                                  Wet Wet Wet          1766        58
573                                        Wham!          1296        21
574                                  Whiskeytown          1584        53
575                                   Whitesnake          2515       103
576                              Whitney Houston          2771        93
577                                          Who          4980       163
578                             Widespread Panic          3423       100
579                                   Will Smith          3473        30
580                                Willie Nelson          3253       148
581                              Wilson Phillips          1258        32
582                               Wilson Pickett          1034        24
583                                 Wishbone Ash          2910       102
584                            Within Temptation          2124        71
585                                  Wiz Khalifa          3264        40
586                                 Wu-Tang Clan          7684        53
587                                  Wyclef Jean          3975        45
588                                            X           978        18
589                                      X Japan           869        12
590                                     X-Raided          1286         7
591                                   X-Ray Spex           915        22
592                                      X-Treme            38         1
593                                      Xandria          1021        25
594                                Xavier Naidoo           373         3
595                                  Xavier Rudd          1270        40
596                                      Xentrix           568         9
597                                      Xiu Xiu          1021        25
598                                       Xscape          1595        37
599                                          XTC          4996       144
600                                       Xzibit          5684        47
601                                        Yazoo           991        23
602                              Yeah Yeah Yeahs          1468        50
603                                     Yelawolf          2080        14
604                                        Yello          1967        57
605                                   Yellowcard          2023        72
606                             Yeng Constantino           613         8
607                                          Yes          3854       108
608                                           YG          1431        12
609                              Ying Yang Twins          3334        35
610                             Yngwie Malmsteen          2651       106
611                                     Yo Gotti          2539        22
612                                  Yo La Tengo          1541        47
613                                     Yoko Ono          2762        85
614                                Yolanda Adams          1858        48
615                  Yonder Mountain String Band           646        10
616                                     You Am I          2224        54
617                                   Young Buck          2495        17
618                                    Young Dro          1290         7
619                                  Young Jeezy          4877        57
620                                  Youngbloodz          2518        19
621                               Youth Of Today           864        22
622                                     Yukmouth          1612        11
623                                     Yung Joc          1577         9
624                                  Yusuf Islam           739        15
625                                         Z-Ro          5242        54
626                               Zac Brown Band          1531        31
627                                   Zakk Wylde           915        21
628                                          Zao          1305        30
629                                   Zayn Malik           780        15
630                                        Zazie           196         2
631                                        Zebra           796        20
632                                    Zebrahead          3337        76
633                                          Zed            84         1
634                                       Zero 7           949        24
635                                   Zeromancer           888        30
636                                 Ziggy Marley          2462        64
637                                          Zoe            69         1
638                                      Zoegirl          1595        38
639                                       Zornik           460        12
640                                          Zox          1019        21
641                                     Zucchero          1343        30
642                                         Zwan           685        14
643                                       ZZ Top          3765       132
library(plotly)

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
# Plot interactive scatter plot
plot_ly(unique_words, x = ~num_songs, y = ~text_combined, text = ~artist, type = 'scatter', mode = 'markers') %>%
  layout(title = "Number of Songs Written vs. Number of Unique Words Used",
         xaxis = list(title = "Number of Songs Written"),
         yaxis = list(title = "Number of Unique Words Used"))

The code below compares the vocabulary richness of different artists after adjusting for how much each artist has written. It first counts the total number of words in each artist's combined lyrics, then divides the artist's vocabulary size (the number of unique words) by that total. The result, stored as weighted_avg_vocabulary, is the share of an artist's words that are unique, so an artist no longer ranks highly simply for having more lyrics in the data set. The code then selects the 5 artists with the largest and the 5 with the smallest adjusted values and plots the two groups side by side using ggplot. Unlike the earlier ranking, which was based solely on the raw number of unique words, this approach accounts for the disparity in output between artists. One caveat: this ratio tends to fall as the amount of text grows, so artists with very few words can appear near the top.

# Calculate the total number of words written by each artist
total_words <- aggregate(text_combined ~ artist, data = all_artists_df, FUN = function(x) length(unlist(strsplit(x, "\\s+"))))

# Merge total words with vocabulary dataframe
vocabulary_df <- merge(vocabulary_df, total_words, by = "artist", all.x = TRUE)

# Calculate adjusted vocabulary (unique words / total words), returning 0 when the total is 0
vocabulary_df$weighted_avg_vocabulary <- ifelse(vocabulary_df$text_combined == 0, 0, vocabulary_df$vocabulary_size / vocabulary_df$text_combined)

# Sort dataframe by weighted average vocabulary size
vocabulary_df <- vocabulary_df[order(vocabulary_df$weighted_avg_vocabulary, decreasing = TRUE), ]

# Create a subset dataframe for the 5 artists with the largest adjusted vocabulary
top5_largest_adjusted <- head(vocabulary_df, 5)

# Create a subset dataframe for the 5 artists with the smallest adjusted vocabulary
top5_smallest_adjusted <- tail(vocabulary_df, 5)

# Combine the top 5 largest and smallest adjusted dataframes
combined_adjusted_df <- rbind(top5_largest_adjusted, top5_smallest_adjusted)

# Add a column to indicate whether the artist has the largest or smallest adjusted vocabulary
combined_adjusted_df$category <- ifelse(combined_adjusted_df$artist %in% top5_largest_adjusted$artist, "Largest", "Smallest")

# Reorder the artists within each category
combined_adjusted_df$artist <- factor(combined_adjusted_df$artist, levels = combined_adjusted_df$artist[order(combined_adjusted_df$category, combined_adjusted_df$weighted_avg_vocabulary)])

# Plot the combined adjusted data
ggplot(combined_adjusted_df, aes(x = reorder(artist, weighted_avg_vocabulary), y = weighted_avg_vocabulary)) +
  geom_bar(stat = "identity", aes(fill = category)) +
  facet_wrap(~ category, scales = "free", nrow = 1) +
  labs(title = "Top 5 Artists with Largest and Smallest Adjusted Vocabulary",
       x = "Artist",
       y = "Unique Words per Total Word",
       fill = "Category") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
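
To make the adjustment concrete, here is a minimal sketch on toy data. The two artists and their lyrics are made up for illustration, and the column name adjusted_vocabulary is an assumption rather than a name used in the report. It applies the same ratio as the code above: unique words divided by total words.

```r
# Toy stand-in for the per-artist combined lyrics (hypothetical data)
toy <- data.frame(
  artist = c("Artist A", "Artist B"),
  text_combined = c("la la la la love", "every word here is different"),
  stringsAsFactors = FALSE
)

# Split each artist's lyrics into lowercase words
words <- strsplit(tolower(toy$text_combined), "\\s+")

# Total words and unique words per artist
toy$total_words <- lengths(words)
toy$vocabulary_size <- vapply(words, function(w) length(unique(w)), integer(1))

# Unique words per total word, guarding against division by zero
toy$adjusted_vocabulary <- ifelse(toy$total_words == 0, 0,
                                  toy$vocabulary_size / toy$total_words)

print(toy[, c("artist", "total_words", "vocabulary_size", "adjusted_vocabulary")])
```

Both toy artists have 5 words in total, but Artist A repeats "la" four times and uses only 2 distinct words, while Artist B uses 5 distinct words. Their adjusted values are therefore 0.4 and 1.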

Question 4: Which artists tend to use the most profanity?

We can gauge profanity in song lyrics by tallying profane instances and normalizing the counts for a fair comparison across artists.

# Placeholder: this step needs a profanity word list and a function for tallying profane instances.
# Replace `profanity_tally_function` with the actual function.
# profanity_counts <- profanity_tally_function(all_words)

# Plotting profanity usage across artists
# (the dataset has only 'artist', 'song', 'link', and 'text' columns, so there is no genre or date to group by)
# Example:
# ggplot(profanity_counts, aes(x = artist, y = profanity_count)) +
#   geom_bar(stat = "identity") +
#   labs(title = "Profanity Usage Across Artists",
#        x = "Artist",
#        y = "Profanity Count") +
#   theme(axis.text.x = element_text(angle = 45, hjust = 1))
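
The tallying step described above could be sketched as follows. This is an illustration in base R under stated assumptions, not the report's implementation: the lyrics table is toy data, profanity_list is a deliberately tiny hypothetical word list, and the tokenize helper and the profanity_per_1000 column are names introduced here. A real analysis would need a curated lexicon.

```r
# Toy stand-in for the lyrics table (hypothetical rows; columns mirror 'artist' and 'text')
lyrics <- data.frame(
  artist = c("Artist A", "Artist A", "Artist B"),
  text = c("Damn this road is long",
           "hell no we won't go, damn",
           "Sunshine on my shoulders"),
  stringsAsFactors = FALSE
)

# Hypothetical, intentionally small profanity list
profanity_list <- c("damn", "hell")

# Lowercase, replace everything except letters, apostrophes, and spaces, then split into words
tokenize <- function(x) {
  cleaned <- tolower(gsub("[^[:alpha:]' ]", " ", x))
  words <- unlist(strsplit(cleaned, "\\s+"))
  words[nzchar(words)]
}

# One word vector per artist, pooling all of that artist's songs
tokens_by_artist <- lapply(split(lyrics$text, lyrics$artist), tokenize)

# Tally total words and profane words for each artist
profanity_counts <- data.frame(
  artist = names(tokens_by_artist),
  total_words = vapply(tokens_by_artist, length, integer(1)),
  profanity_count = vapply(tokens_by_artist,
                           function(w) sum(w %in% profanity_list), integer(1)),
  row.names = NULL
)

# Normalize so artists with more lyrics are not penalized for volume alone
profanity_counts$profanity_per_1000 <-
  1000 * profanity_counts$profanity_count / profanity_counts$total_words

# Rank artists from most to least profane per 1,000 words
profanity_counts <- profanity_counts[order(profanity_counts$profanity_per_1000,
                                           decreasing = TRUE), ]
print(profanity_counts)
```

Normalizing per 1,000 words follows the same reasoning as the vocabulary adjustment in Question 2: a raw count would mostly reflect how many lyrics an artist has in the dataset.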